AITopics | user action

Collaborating Authors

user action

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

From Predictions to Decisions: Using Lookahead Regularization

Neural Information Processing SystemsDec-23-2025, 21:38:27 GMT

Machine learning is a powerful tool for predicting human-related outcomes, from creditworthiness to heart attack risks. But when deployed transparently, learned models also affect how users act in order to improve outcomes. The standard approach to learning predictive models is agnostic to induced user actions and provides no guarantees as to the effect of actions. We provide a framework for learning predictors that are accurate, while also considering interactions between the learned model and user decisions. For this, we introduce look-ahead regularization which, by anticipating user actions, encourages predictive models to also induce actions that improve outcomes. This regularization carefully tailors the uncertainty estimates that govern confidence in this improvement to the distribution of model-induced actions. We report the results of experiments on real and synthetic data that show the effectiveness of this approach.

lookahead regularization, name change, prediction, (6 more...)

Neural Information Processing Systems

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Social-Media Based Personas Challenge: Hybrid Prediction of Common and Rare User Actions on Bluesky

White, Benjamin, Shimorina, Anastasia

arXiv.org Artificial IntelligenceNov-24-2025

Understanding and predicting user behavior on social media platforms is crucial for content recommendation and platform design. While existing approaches focus primarily on common actions like retweeting and liking, the prediction of rare but significant behaviors remains largely unexplored. This paper presents a hybrid methodology for social media user behavior prediction that addresses both frequent and infrequent actions across a diverse action vocabulary. We evaluate our approach on a large-scale Bluesky dataset containing 6.4 million conversation threads spanning 12 distinct user actions across 25 persona clusters. Our methodology combines four complementary approaches: (i) a lookup database system based on historical response patterns; (ii) persona-specific LightGBM models with engineered temporal and semantic features for common actions; (iii) a specialized hybrid neural architecture fusing textual and temporal representations for rare action classification; and (iv) generation of text replies. Our persona-specific models achieve an average macro F1-score of 0.64 for common action prediction, while our rare action classifier achieves 0.56 macro F1-score across 10 rare actions. These results demonstrate that effective social media behavior prediction requires tailored modeling strategies recognizing fundamental differences between action types. Our approach achieved first place in the SocialSim: Social-Media Based Personas challenge organized at the Social Simulation with LLMs workshop at COLM 2025.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2511.17241

Country: North America > Canada (0.28)

Genre: Research Report > New Finding (0.88)

Industry: Government (0.93)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

SODBench: A Large Language Model Approach to Documenting Spreadsheet Operations

Indika, Amila, Molybog, Igor

arXiv.org Artificial IntelligenceOct-24-2025

Numerous knowledge workers utilize spreadsheets in business, accounting, and finance. However, a lack of systematic documentation methods for spreadsheets hinders automation, collaboration, and knowledge transfer, which risks the loss of crucial institutional knowledge. This paper introduces Spreadsheet Operations Documentation (SOD), an AI task that involves generating human-readable explanations from spreadsheet operations. Many previous studies have utilized Large Language Models (LLMs) for generating spreadsheet manipulation code; however, translating that code into natural language for SOD is a less-explored area. To address this, we present a benchmark of 111 spreadsheet manipulation code snippets, each paired with a corresponding natural language summary. We evaluate five LLMs, GPT-4o, GPT-4o-mini, LLaMA-3.3-70B, Mixtral-8x7B, and Gemma2-9B, using BLEU, GLEU, ROUGE-L, and METEOR metrics. Our findings suggest that LLMs can generate accurate spreadsheet documentation, making SOD a feasible prerequisite step toward enhancing reproducibility, maintainability, and collaborative workflows in spreadsheets, although there are challenges that need to be addressed.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2510.19864

Country: North America > United States (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Taxonomy of User Needs and Actions

Shelby, Renee, Diaz, Fernando, Prabhakaran, Vinodkumar

arXiv.org Artificial IntelligenceOct-13-2025

The growing ubiquity of conversational AI highlights the need for frameworks that capture not only users' instrumental goals but also the situated, adaptive, and social practices through which they achieve them. Existing taxonomies of conversational behavior either overgeneralize, remain domain-specific, or reduce interactions to narrow dialogue functions. To address this gap, we introduce the Taxonomy of User Needs and Actions (TUNA), an empirically grounded framework developed through iterative qualitative analysis of 1193 human-AI conversations, supplemented by theoretical review and validation across diverse contexts. TUNA organizes user actions into a three-level hierarchy encompassing behaviors associated with information seeking, synthesis, procedural guidance, content creation, social interaction, and meta-conversation. By centering user agency and appropriation practices, TUNA enables multi-scale evaluation, supports policy harmonization across products, and provides a backbone for layering domain-specific taxonomies. This work contributes a systematic vocabulary for describing AI use, advancing both scholarly understanding and practical design of safer, more responsive, and more accountable conversational systems.

data mining, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2510.06124

Country:

Asia > Middle East (0.92)
North America > United States > Massachusetts (0.67)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
(2 more...)

Genre:

Overview (0.92)
Personal > Interview (0.46)

Industry:

Education (1.00)
Government (0.92)
Law (0.67)
(3 more...)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
(7 more...)

Add feedback

UISim: An Interactive Image-Based UI Simulator for Dynamic Mobile Environments

Xiang, Jiannan, Zhu, Yun, Shu, Lei, Wang, Maria, Yu, Lijun, Barcik, Gabriel, Lyon, James, Sunkara, Srinivas, Chen, Jindong

arXiv.org Artificial IntelligenceSep-29-2025

Developing and testing user interfaces (UIs) and training AI agents to interact with them are challenging due to the dynamic and diverse nature of real-world mobile environments. Existing methods often rely on cumbersome physical devices or limited static analysis of screenshots, which hinders scalable testing and the development of intelligent UI agents. We introduce UISim, a novel image-based UI simulator that offers a dynamic and interactive platform for exploring mobile phone environments purely from screen images. Our system employs a two-stage method: given an initial phone screen image and a user action, it first predicts the abstract layout of the next UI state, then synthesizes a new, visually consistent image based on this predicted layout. This approach enables the realistic simulation of UI transitions. UISim provides immediate practical benefits for UI testing, rapid prototyping, and synthetic data generation. Furthermore, its interactive capabilities pave the way for advanced applications, such as UI navigation task planning for AI agents. Our experimental results show that UISim outperforms end-to-end UI generation baselines in generating realistic and coherent subsequent UI states, highlighting its fidelity and potential to streamline UI development and enhance AI agent training.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2509.21733

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(3 more...)

Add feedback

Small Models, Big Results: Achieving Superior Intent Extraction through Decomposition

Cohen, Danielle, Halpern, Yoni, Kahlon, Noam, Oren, Joel, Berkovitch, Omri, Caduri, Sapir, Dagan, Ido, Efros, Anatoly

arXiv.org Artificial IntelligenceSep-17-2025

Understanding user intents from UI interaction trajectories remains a challenging, yet crucial, frontier in intelligent agent development. While massive, datacenter-based, multi-modal large language models (MLLMs) possess greater capacity to handle the complexities of such sequences, smaller models which can run on-device to provide a privacy-preserving, low-cost, and low-latency user experience, struggle with accurate intent inference. We address these limitations by introducing a novel decomposed approach: first, we perform structured interaction summarization, capturing key information from each user action. Second, we perform intent extraction using a fine-tuned model operating on the aggregated summaries. This method improves intent understanding in resource-constrained models, even surpassing the base performance of large MLLMs.

large language model, machine learning, trajectory, (18 more...)

arXiv.org Artificial Intelligence

2509.12423

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.93)

Industry: Consumer Products & Services (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.54)

Add feedback

Generalized Delayed Feedback Model with Post-Click Information in Recommender Systems

Neural Information Processing SystemsAug-17-2025, 11:06:03 GMT

Predicting conversion rate (e.g., the probability that a user will purchase an item) is

artificial intelligence, information, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
Asia > China > Heilongjiang Province > Daqing (0.04)

Industry: Information Technology > Services (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.41)

Add feedback

ViMo: A Generative Visual GUI World Model for App Agents

Luo, Dezhao, Tang, Bohan, Li, Kang, Papoudakis, Georgios, Song, Jifei, Gong, Shaogang, Hao, Jianye, Wang, Jun, Shao, Kun

arXiv.org Artificial IntelligenceMay-21-2025

App agents, which autonomously operate mobile Apps through Graphical User Interfaces (GUIs), have gained significant interest in real-world applications. Yet, they often struggle with long-horizon planning, failing to find the optimal actions for complex tasks with longer steps. To address this, world models are used to predict the next GUI observation based on user actions, enabling more effective agent planning. However, existing world models primarily focus on generating only textual descriptions, lacking essential visual details. To fill this gap, we propose ViMo, the first visual world model designed to generate future App observations as images. For the challenge of generating text in image patches, where even minor pixel errors can distort readability, we decompose GUI generation into graphic and text content generation. We propose a novel data representation, the Symbolic Text Representation~(STR) to overlay text content with symbolic placeholders while preserving graphics. With this design, ViMo employs a STR Predictor to predict future GUIs' graphics and a GUI-text Predictor for generating the corresponding text. Moreover, we deploy ViMo to enhance agent-focused tasks by predicting the outcome of different action options. Experiments show ViMo's ability to generate visually plausible and functionally effective GUIs that enable App agents to make more informed decisions.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2504.13936

Country: Europe (0.28)

Genre:

Research Report (0.82)
Workflow (0.68)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

ScreenLLM: Stateful Screen Schema for Efficient Action Understanding and Prediction

Jin, Yiqiao, Petrangeli, Stefano, Shen, Yu, Wu, Gang

arXiv.org Artificial IntelligenceMar-26-2025

Graphical User Interface (GUI) agents are autonomous systems that interpret and generate actions, enabling intelligent user assistance and automation. Effective training of these agent presents unique challenges, such as sparsity in supervision signals, scalability for large datasets, and the need for nuanced user understanding. We propose stateful screen schema, an efficient representation of GUI interactions that captures key user actions and intentions over time. Building on this foundation, we introduce ScreenLLM, a set of multimodal large language models (MLLMs) tailored for advanced UI understanding and action prediction. Extensive experiments on both open-source and proprietary models show that ScreenLLM accurately models user behavior and predicts actions. Our work lays the foundation for scalable, robust, and intelligent GUI agents that enhance user interaction in diverse software environments.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3701716.3718379

2503.20978

Country:

Oceania > Australia > New South Wales > Sydney (0.05)
North America > United States > California > Santa Clara County > San Jose (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Education > Educational Technology (0.68)

Technology:

Information Technology > Human Computer Interaction (1.00)
Information Technology > Graphics (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

Query-based versus resource-based cache strategies in tag-based browsing systems

Gayoso-Cabada, Joaquín, Gómez-Albarrán, Mercedes, Sierra, José-Luis

arXiv.org Artificial IntelligenceJan-26-2025

Tag-based browsing is a popular interaction model for navigating digital libraries. According to this model, users select descriptive tags to filter resources in the collections. Typical implementations of the model are based on inverted indexes. However, these implementations can require a considerable amount of set operations to update the browsing state. To palliate this inconven-ience, it is possible to adopt suitable cache strategies. In this paper we describe and compare two of these strategies: (i) a query-based strategy, according to which previously computed browsing states are indexed by sets of selected tags; and (ii) a resource-based strategy, according to which browsing states are in-dexed by sets of filtered resources. Our comparison focused on runtime perfor-mance, and was carried out empirically, using a real-world web-based collec-tion in the field of digital humanities. The results obtained show that the re-source-based strategy clearly outperforms the query-based one.

artificial intelligence, information retrieval, natural language, (21 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-3-030-04257-8_4

2501.15481

Country:

Europe > Spain > Galicia > Madrid (0.04)
North America > Panama (0.04)
Europe > Poland (0.04)

Genre: Research Report > Experimental Study (0.46)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Communications > Social Media (0.68)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.46)

Add feedback